Noisy traveling waves: effect of selection on genealogies 
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For a family of models of evolving population under selection, which can be described by noisy 
traveling wave equations, the coalescence times along the genealogical tree scale like In'^ A'^, where 
A'^ is the size of the population, in contrast with neutral models for which they scale like A'^. An 
argument relating this time scale to the diffusion constant of the noisy traveling wave leads to a 
prediction for a which agrees with our simulations. An exactly soluble case gives trees with statistics 
identical to those predicted for mean-field spin glasses by Parisi's theory. 



Traveling wave equations such as the F-KPP equation 



dth = dlh + h-h^ P,ii describe how a stable medium 
h — 1 invades an unstable medium h = 0. They were first 
introduced to study how an advantageous gene propa- 
gates through a population, h being the fraction of the 
population with the advantageous gene. They also ap- 
pear in other contexts such as disordered systems l^lsLlg, 
QCD 131 , reaction-diffusion fragmentation jlfll or 

chemistry |lH . 

Traveling wave equations often represent a mean field 
picture where the fluctuations at the microscopic scale 
are ignored. The effect of these fluctuations can be 
represented H m El El El by a noise term {dth = 
d^h + h — h? + ^vi^i t)\/h — K^). Determining quantita- 
tively the effect of a weak noise (e ^ 1) on the front 
position is a subject of active research. There is in- 
creasing evidence that the dynamics of the position of 
the front is dominated by the fluctuations near its tip 
[isl llillr^ and that there is a shift in the velocity of the 
frontpnTaliglliolEllE^. logarithmic in the amplitude 
e of the noise, as predicted by a simple cut-off theory [23| . 

In the present letter, we consider models of an evolving 
population under selection, which can be described by 
noisy traveling wave equations. Instead of focusing on the 
time dependence of the position of the front, we look at 
the problem from a different perspective: we determine 
how coalescence times in the genealogy depend on the 
size of the population. Our simulations as well as a simple 
argument indicate that these coalescence times scale as 
the inverse of the diffusion constant of the front. 

We consider a population of fixed size TV with asexual 
reproduction. Each individual i is characterized by a real 
number Xi measuring its adequacy to the environment, 
and the population is fully specified by these N positions 
XiS on the adequacy axis. At each new generation, all 
the individuals disappear after giving birth to some off- 
springs. We consider the following variants: 

In Model A, each individual gives birth to k off- 
springs, and the j-th offspring of individual i is at posi- 
tion Xi+Ci^j , where the e^.j are uncorrelated random num- 
bers chosen according to some distribution p(e). Thus, 
each individual inherits its parent's adequacy, and e^.j 



accounts for the effects of mutation. Then comes the se- 
lection step: out of the kN new individuals, we only keep 
the N best ones, the ones with the highest Xi's. Typically, 
we will take k — 2 and p(e) uniform between and 1. 
A similar model was proposed recently 0, 12^ . [H 
to study the evolution under competitive selection of a 
population of DNA molecules in vitro. The population 
undergoes several cycles where, in the first part of a cy- 
cle, each molecule is amplified by a fixed number k with 
possible mutations and in the second part of the cycle 
selection acts by keeping only the best 1/fc fraction of 
these offsprings, defined as the molecules with the high- 
est binding energies to a given target. In this picture, Xi 
represents this binding energy. 

We also investigate Model A' where, instead of keep- 
ing the N best individuals at each generation, we keep 
N individuals randomly chosen among the {2>/2)N best 
ones. This allows us to check that our results remain 
unchanged under a less stringent selection. 

In Model B, each individual i has infinitely many off- 
springs, with positions distributed according to a Poisson 
point process of density il}{x — Xi) (i.e., with probabil- 
ity ip{x — Xi) dx, there is an offspring of individual i at 
position x). As in Model A, we only keep the N best 
offsprings. Here, is a positive function such that 

j 4'{e)de — oo (for the population not to disappear) and 
which decays fast enough as e — > cxd to ensure that these 
best offsprings have finite positions. Having infinitely 
many offsprings at the first step is of course unrealistic, 
but after selection, each individual has only a finite num- 
ber of offsprings. The main two reasons for considering 
Model B are to check the robustness of our results and 
to exhibit an exactly soluble case for one particular '0(e)- 

The genealogical tree of an evolving population can be 
characterized in many ways [H . Here we measure 
average coalescence times (Tp) defined as follows: Tp is 
the age of the most recent common ancestor of p individ- 
uals chosen at random at generation g, and (Tp) is the 
average of Tp over all choices of these p individuals and 
over all generations g. 

In absence of selection (for example when each individ- 
ual has k offsprings as in model A, but with N survivors 
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FIG. 1: Times (T2) versus A'^, for different models. The scale 
of the N axis is InlnAf. The data points are compared to 
several power laws of In A'' shown by the straight lines. 



chosen uniformly among these kN offsprings), these (Tp) 
grow linearly with N and their ratios take, for large iV, 
the simple values (independent of k) of the Kingman co- 
alescent 12911301 : 
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One goal of the present work is to show that the effect 
of selection changes completely Eq. the time scale of 
these coalescence times {Tp) becomes 



\nN\ 



(2) 



and the ratios are compatible with the values character- 
izing the Bolthausen-Sznitman coalescent 
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In Figs, n and [3 we show the results of simulations 
for different cases : models A and A' with k = 2 and a 
uniform density p(e) = 1 for < e < 1, and model B for 
three different choices of the density tp{e): ipi{e) = 
M^) = (-e)'^(-e) and M^) = e-^ 

Typically we simulated populations of sizes ranging 
from N = 10^ to 10^ over 10'' generations. We measured 
the times {Tp) by recording at each generation g the age 
T2{i,j) of the most recent common ancestor of individ- 
uals i and j. One then gets (T2) by averaging T2{i,j) 
over i, j and g. As the matrix T2{i,j) is ultrametric, 
no additional information is needed to compute the (Tp) : 
for instance, T^{i,j, k) = max[r2(*, j), ^2(1, k)]. For large 
sizes N , we actually took advantage of ultrametricity by 
representing the matrix T2{i,j) as a tree: at each step, 
we only kept track of the current N individuals and of all 
the most recent common ancestors of any pair of them. 
There are at most A^ — 1 such ancestors, so both memory 
and execution time grow linearly with A^, instead of N"^ 
if we were manipulating the full matrix. 



FIG. 2: Ratios {T-i) / {T2) and {T4,) / {T2) versus iV, for the 
same models (same symbols) as in Fig. The dashed lines 
represent the the neutral case Eq. Q, and the dotted lines 
correspond to Eq. 0, i.e. model B with ip{e) = e""'. 



In all cases except for model B with the exponential 
distribution, which is special as we shall see below, Fig.Q] 
indicates that the exponent a defined in Eq. Q is in the 
range 



2. < a 



mcasurcc 



< 3. 



(4) 



For model B with the exponential distribution, however, 
our data suggest a significantly smaller value .75 < a < 1. 

Fig. H shows the ratios {T^) / {T2) and {Ti) / {T2) for 
the same models as in Fig. ^ In all cases, including 
model B with '4>{e) — e^'^, these ratios take for large 
A^ values similar to Eq. which differ noticeably from 
their values in absence of selection. At present, we 
do not have a general theory to explain this numerical 
result. 

Only for model B with 'ip{t) — e"*^ (the special case), 
one can calculate these ratios exactly using a method sim- 
ilar to ^5]. If at generation the population consists of 
A^ individuals xi{g), . . . ,XN{g), the probability of having 
one of their offsprings in the interval y,y + dy is 



N 



N 



^^(y-x,(5)) dy = ^e-'(^)-^ = e^^"^ dy 



i=l 

where 



Ag=ln 



So, for model B with ilj{e) — e^', the offsprings of the 
whole population are the same as those of a single effec- 
tive individual at position Xg. This means that one can 
write Xi{g + 1) = Xg + y,, where 2/2, yn are the A 
largest values of a Poisson point process on the line with 
exponential density. The yiS are therefore distributed 
according to 

P(l/jv < yN-i < • • • < yi) = e-^y'+y-+-+y-'^-^''"' (5) 
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With this simpHfication, one can see that the differ- 
ences AXg = Xg+i — Xg 3.16 indcpcnclent identically dis- 
tributed random variables. At generation g, the N num- 
bers Xi{g) form a cloud of points which does not spread in 
time, very much like a quantum A'^-particle bound state. 
This cloud has a well defined velocity vn and diffusion 
constant D^. As the differences A.Xg are independent, 
Vn and are given by: 

VN^iAXg) ; Dn = {[AXg]') ^ {AXg)' 

where the expectations are over the distribution 10) of 
the Hi's and 



AX„ - X„ 



X„ = In [e^i + ... + : 



Calculations similar to those of 5] lead for large N to 



Vn = In In TV - 



In In AT + 1 
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We now turn to computing (Tp) in the exponential 
case. If one chooses randomly p > 2 individuals at gen- 
eration .9 + 1, one can calculate the probability Qp that 
they have the same ancestor at generation g 



1 



1 



p- 1 InA^ 



(7) 



One can also show that for large A^ and fixed p, events 
with more than one coalescence within the p individuals 
between two successive generations have a probability of 
order (In A)~^ at most. Thus, for large A^, the genealog- 
ical tree of a sample of p individuals consists of single 
coalescence events separated by times of order In A^. 

From the knowledge of the qp's, one can obtain for large 
A^ the probability rp{k) that p individuals at generation 
g + I have exactly k < p ancestors at generation g: 



k-l 



^^^j\{k-i-j)\ip-k + iy. 



{p - k){p - k + 1) In A 



and one has 



(Tp) ^l + {Tp) + Y, rp{k) [(Tfc) - {Tp)] (9) 

k<p 

Using the fact that (Ti) = this immediately gives 

(T2)~lnA^ (10) 

(in reasonable agreement with our simulations of Fig. ^ 
and all the ratios (T!„)/(T2). For n = 3 and 4, this gives 
Eq. |j3J| which is therefore asymptotically exact in the 
exponential case. 

We now return to the general case. Our models are 
branching processes which are known to be related to 



fronts of the F-KPP type |33|. Let us now see how one 
can associate to model B a noisy traveling wave equation 
(a similar calculation can be done for model A). At gen- 
eration g the whole population can be characterized by a 
function hg{x) which counts the fraction of individuals i 
such that Xi{g) > x. Obviously hg{x) has the shape of a 
front (/ig(— oo) = 1 and hg{oo) ~ 0). From the definition 
of model B, one can show that hg(x) satisfies 



hgj^\{x) = min 



1, / hg{x — t)i\){e)de + 



(11) 



where r]g{x) is some correlated noise of zero- mean with a 
variance equal to the integral appearing in Eq. (fTTT) . One 
can show as in |^ that this noise is Gaussian far from the 
tip of the front. 

In the large A^ limit, Eq. (|ll|l becomes deterministic. 
One can look for traveling wave solutions moving at a 
velocity v. In the region where h <^ 1, an exponential 
shape hg(x) ~ Aexp[— 7(2; — wg)] is a solution if 



V = f (7) = — In 

7 



o7c 



(12) 



If the front is of the F-KPP type Q , its velocity for steep 
enough initial conditions is the smallest one for which 
7 is real. Furthermore, for finite but large N ^ one ex- 
pects 0, ^3 1^ a correction to this velocity of order 
[lnA^]~^ and a diffusion constant scaling like [InA^]"'^. 
We have checked that these predictions are compatible 
with our numerical simulations for all the models that we 
tested when w(7) has a minimum value (all the models 
of Figs, n 13 except the exponential case have this prop- 
erty). For other choices such as '(/'(e) — exp(— e), however, 
v[j) — 00 for all 7, and the front is not of the F-KPP 
type. It is therefore not surprising that the genealogy of 
the exponential case is special. 

In all cases, the times (r2) or (Tp) give the order of 
magnitude of the age of the most recent common ancestor 
of the whole population and, therefore, the time scale on 
which the population looses memory about its genealogy. 
Now, the position of this most recent common ancestor 
has fluctuations of order 1, and this contributes a random 
shift of order 1 to the displacement of front. Therefore, 
the fluctuating part of the position at generation g is the 
sum of roughly g / {T2) independent random variables of 
order 1. Within this picture, the diffusion constant but 
also all the cumulants of the position would scale like 



Dn - 1/(T2) ~ l/{Tp) 



(13) 



That all cumulants have the same large A dependence 
was indeed one of the main results of our previous work 
[13. Note that Eq. (O does hold (Eqs. ® and 10) in 
the exponential case. Assuming that it remains valid in 
general, we would then predict 



Q^prcdictioii 



(14) 
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since there is now good numerical evidence |l5ll34j as well 
as analytic arguments ^3 in favor of Dat ~ [In A^]""^ in 
the generic case, i.e. if i'(7) has a finite minimum. We 
think that this prediction agrees with our measurements 
Q up to finite size corrections: in fact, for the diffusion 
constant itself, it already turned out [l^ 13 that the 
large N asymptotic regime was only observed for much 
larger systems than the ones studied here. 

Beyond the fact that the time scale is logarithmic in N, 
which is not surprising for models of evolution in presence 
of strong selection [s^ [s^ [s^l , the times (Tp) are char- 
acteristic of the statistical properties of the genalogical 
trees of samples of a few individuals. For the exponen- 
tial model, these statistics are totally specified by the fact 
that there is at most one single coalescence event at each 
generation with probability Eq. Q. Surprisingly, these 
coalescence probabilities (up to the factor IniV which 
fixes the time-scale) are the same as those which emerged 
from the theory of spin-glasses so that trees, 

in the exponential model B here, have exactly the same 
statistics as the ultrametric trees of Parisi's mean-field 
theory of spin-glasses [s^ [s^ ■ So far we have not been 
able to develop a replica approach for noisy traveling 
waves to justify this connection. There is however some 
hope to do so since noisy traveling waves appear in the 
study of directed polymers in a random medium .5], a 
system for which Parisi's theory is known to be valid at 
the mean-field level Q. 

The numerical data presented in this paper show that, 
for the class of models we considered, selection has dras- 
tic effects on the genealogies: the coalescence times be- 
come logarithmic in the population size l(2Jl instead of 
linear and the statistics of the coalesence times are mod- 
ified. The accuracy of our simulations is not sufficient 
to be sure that the exponent a and the ratios of coales- 
cence times are universal (for all models for which v{'^) 
has a finite minimum). We however gave an argument 
which supports the conjecture (|14|l . Of course, de- 
veloping an analytical approach susceptible of proving or 
disproving this universality is a challenging open prob- 
lem. Another open issue is whether thinking in terms 
of genealogies is limited to the family of selection mod- 
els discussed here or could be extended to more general 
noisy traveling wave equations. 

Lastly, it would be interesting to know what our ratios 
would become in other models of evolution with selec- 
tion such as (4^ ^] and if there is a chance of estimating 
them from experimental data on genetic diversity. 

This work was partially supported by the U.S. Depart- 
ment of Energy. 
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